Han Shen

Light Alignment Improves LLM Safety via Model Self-Reflection with a Single Neuron

Feb 02, 2026

On Entropy Control in LLM-RL Algorithms

Sep 03, 2025

Fundamental Safety-Capability Trade-offs in Fine-tuning Large Language Models

Mar 24, 2025

Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning

Oct 20, 2024

SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection

Oct 09, 2024

Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF

Feb 10, 2024

Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization

Jan 13, 2024

On Penalty-based Bilevel Gradient Descent Method

Feb 10, 2023

Alternating Implicit Projected SGD and Its Efficient Variants for Equality-constrained Bilevel Optimization

Nov 14, 2022

Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Stochastic Approach

Oct 23, 2022